Kevin Donovan
1/26/2021
Features: \[ X= \left(\begin{array}{c} X_1 \\ X_2 \\ \vdots \\ X_p \end{array}\right) \]
Model: \[Y = f(X) + \epsilon\]
Goals
Recall: \(MSE\) for estimate at \(X=x\) can be decomposed into \[MSE_{\hat{f}}(x)=E[(Y-\hat{f}(X))^2|X=x]=[f(x)-\hat{f}(x)]^2+Var(\epsilon)\]
Consider taking expectation marginally (i.e., across \(Y\) and \(X\)).
Can show \[E[(Y-\hat{f}(X))^2]=E_x[\text{bias}(\hat{f}(x))^2]+E_x[\text{Var}(\hat{f}(x))]+\text{Var}(\epsilon)\]
where \(\text{bias}(\hat{f}(x))=\text{E}[\hat{f}(x)]-f(x)\)
Above means both the bias and the variance of the model increase expected model error
Creates a tradeoff: lowering bias (e.g., with a more flexible model) tends to raise variance, and vice versa
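The tradeoff can be seen in a small simulation (a sketch, not from the notes): estimate \(f(x_0)\) by averaging the responses whose \(x\) falls in a window around \(x_0\). The function \(f\), the noise level, and the window widths below are all illustrative choices; widening the window averages over more points (lower variance) but over a less representative region (higher bias).

```python
import random
import math

def f(x):
    return math.sin(2 * math.pi * x)

def simulate(width, n=50, reps=2000, x0=0.25, sigma=0.3):
    """Monte Carlo estimate of bias^2 and variance of a window-average
    estimator of f(x0), over repeated draws of the training data."""
    random.seed(0)
    estimates = []
    for _ in range(reps):
        xs = [random.random() for _ in range(n)]
        ys = [f(x) + random.gauss(0, sigma) for x in xs]
        # average responses within +/- width of x0 (fall back to the
        # single nearest point if the window happens to be empty)
        near = [y for x, y in zip(xs, ys) if abs(x - x0) <= width]
        if not near:
            near = [min(zip(xs, ys), key=lambda p: abs(p[0] - x0))[1]]
        estimates.append(sum(near) / len(near))
    mean_est = sum(estimates) / reps
    bias2 = (mean_est - f(x0)) ** 2
    var = sum((e - mean_est) ** 2 for e in estimates) / reps
    return bias2, var

for w in (0.02, 0.1, 0.4):
    b2, v = simulate(w)
    print(f"width={w}: bias^2={b2:.4f}  variance={v:.4f}")
```

Moving down the printed table, bias\(^2\) grows while variance shrinks; expected error is minimized at some intermediate width.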
Suppose instead response \(Y\) is categorical
e.g. cancer stage is one of \(C=(0, 1, 2, 3, 4)\) where \(0\) indicates cancer-free
Goals:
What to model?:
Let \(p_k(x)=\text{Pr}(Y=k|X=x)\), \(k=1,2,\ldots,K\)
These are called the conditional class probabilities at \(x\)
If these are known, can define classifier at \(x\) by
\(f(x)=j\) if \(p_j(x)=\text{max}[p_1(x), \ldots, p_K(x)]\)
This is called the Bayes optimal classifier at \(x\)
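When the conditional class probabilities are known, the Bayes classifier is just an argmax over them. A minimal sketch (the probabilities below are made up for illustration):

```python
def bayes_classifier(p):
    """Given {class k: p_k(x)} at a point x, return the class with the
    largest conditional probability."""
    return max(p, key=p.get)

# Hypothetical conditional probabilities at some x for cancer stages 0-4:
p_at_x = {0: 0.50, 1: 0.20, 2: 0.15, 3: 0.10, 4: 0.05}
print(bayes_classifier(p_at_x))  # -> 0, the class with largest probability
```

In practice \(p_k(x)\) is unknown, so classifiers work by estimating these probabilities (or the decision boundary) from training data.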
Basic:
\(\text{accuracy}=\frac{\text{# correct predictions}}{\text{# test instances}}\)
\(\text{error}=1-\text{accuracy}\)
These are in general not sufficient (why?)
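One reason accuracy and error alone are insufficient: they ignore class imbalance. A quick illustration with made-up counts:

```python
# Hypothetical test set: 95 cancer-free (0), 5 with cancer (1).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100          # classifier that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
error = 1 - accuracy
print(f"accuracy = {accuracy:.2f}, error = {error:.2f}")  # 0.95, 0.05
# 95% accurate, yet it detects none of the 5 cancer cases:
# sensitivity for class 1 is 0.
```

This is why per-class metrics (sensitivity, specificity) or chance-corrected ones (Kappa, reported in the caret output below) are used alongside accuracy.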
Example
During the COVID-19 pandemic, different metrics were used to quantify risk:
## k-Nearest Neighbors
##
## 30 samples
## 2 predictor
##
## Pre-processing: centered (2), scaled (2)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 30, 30, 30, 30, 30, 30, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 4.585041 0.8918501 3.700367
## 7 4.845494 0.8979390 3.972826
## 9 5.044867 0.8978793 4.163350
## 11 5.408245 0.9069323 4.400432
## 13 6.119488 0.8964707 4.995359
## 15 6.908745 0.8930579 5.724303
## 17 7.706909 0.8906881 6.415243
## 19 8.613891 0.8744222 7.197625
## 21 9.406709 0.8592141 7.902947
## 23 10.066420 0.8578698 8.499900
## 25 10.842491 0.7900220 9.237286
## 27 11.829461 0.7020191 10.111490
## 29 12.466729 0.6103163 10.753089
## 31 12.744056 NaN 11.034206
## 33 12.744056 NaN 11.034206
## 35 12.744056 NaN 11.034206
## 37 12.744056 NaN 11.034206
## 39 12.744056 NaN 11.034206
## 41 12.744056 NaN 11.034206
## 43 12.744056 NaN 11.034206
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 5.
## k-Nearest Neighbors
##
## 30 samples
## 2 predictor
## 2 classes: 'no', 'yes'
##
## Pre-processing: centered (2), scaled (2)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 30, 30, 30, 30, 30, 30, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 5 0.8625986 0.7151352
## 7 0.8640695 0.7320667
## 9 0.8463797 0.6966267
## 11 0.8348877 0.6892596
## 13 0.8283119 0.6807602
## 15 0.8381627 0.7011166
## 17 0.7941796 0.6165274
## 19 0.7428680 0.5514878
## 21 0.6846963 0.4460085
## 23 0.6910537 0.4625074
## 25 0.5882703 0.2921750
## 27 0.4964599 0.1476145
## 29 0.4181164 0.0000000
## 31 0.4181164 0.0000000
## 33 0.4181164 0.0000000
## 35 0.4181164 0.0000000
## 37 0.4181164 0.0000000
## 39 0.4181164 0.0000000
## 41 0.4181164 0.0000000
## 43 0.4181164 0.0000000
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 7.
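For classification, kNN replaces the neighbor average with a majority vote. A minimal sketch with made-up data (again an illustration, not caret's implementation):

```python
import math
from collections import Counter

def knn_classify(x_new, X, y, k):
    """Predict the class of x_new by majority vote among the k
    nearest training points in Euclidean distance."""
    order = sorted(range(len(X)), key=lambda i: math.dist(x_new, X[i]))
    votes = Counter(y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

X = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
y = ['no', 'no', 'yes', 'yes', 'yes']
print(knn_classify((1.0, 0.95), X, y, k=3))  # the 3 nearest are all 'yes'
```

Note in both caret outputs that once \(k\) approaches the training-set size (30 samples), every neighborhood contains essentially all of the data, so predictions collapse to the overall mean or majority class; hence the constant RMSE with undefined \(R^2\) in the regression fit, and the constant accuracy with Kappa 0 in the classification fit.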